"An APK Framework for Addressing
Spatial Granularity Issues in NDVI Crop
Clustering"
//Abstract (Incomplete, 132 words): Crop yield analysis is an arduous task due to the crop's
reliance on multiple factors such as weather, soil, temperature, and precipitation. Recent studies
[citations here] show that satellite image analysis is a potent method for analyzing crop yield.
Using the image databases available from multiple satellites, along with other factors affecting
crop yield, we have devised a novel method for crop yield analysis. Our approach relies on
databases from two identical satellites, Sentinel-2A & Sentinel-2B. The data fetched from the
corresponding satellite is fed into the QGIS software, which computes the NDVI of the respective
images. In the next step, we applied DBSCAN to the fetched NDVI image, and simultaneously,
other features affecting crop yield were integrated. This process culminated in the data being
ready for the selected model, CNN-RNN. CNN-RNN provides a way to determine the
underlying patterns in crop images over time [citation here]. …rest to be written
after model deployment…//
Introduction:
Crop yield analysis is crucial for agricultural monitoring and pre-harvest yield estimation,
playing a significant role in ensuring food security. Recent advancements in machine learning
and remote sensing technologies have enhanced the accuracy and efficiency of crop yield
prediction.
Van Kloppenburg (2020) provides a comprehensive review of machine learning approaches in
crop yield prediction, detailing key models, methodologies, and applications in agriculture. The
study also discusses prevailing challenges, emerging trends, and future directions aimed at
improving forecasting accuracy and scalability. Similarly, Ferral (2003) explores the application
of remote sensing technologies to address environmental and societal challenges, including
resource management and climate change.
Deep learning models have also been leveraged to optimize agricultural decision-making.
Elavarasan (2020) investigates the use of deep reinforcement learning models to predict crop
yields, demonstrating their potential to enhance sustainability by improving farming operations
and resource management. In a broader review, Condran (2022) examines the evolution of
machine learning applications in precision agriculture over the past two decades, focusing on key
trends, major applications, and performance evaluations in crop management, yield prediction,
and resource efficiency.
Recent studies have also emphasized the integration of remote sensing data with advanced
computational techniques. Lu (2023) introduces a multi-scale feature fusion semantic
segmentation model for crop classification using high-resolution remote sensing imagery. By
combining spatial and spectral data, the model improves classification accuracy and efficiently
differentiates between crop types. Sayago (2018) presents a similar approach, leveraging multi-
scale feature fusion to enhance precision in crop categorization.
Satellite imagery has been a fundamental component in agricultural forecasting. Sabini (2017)
explores the potential of satellite-based data to improve crop production predictions, particularly
by integrating machine learning techniques with remote sensing information. In a more recent
study, Olisah (2024) develops a deep neural network model for corn production forecasting,
aimed at assisting smallholder farmers in making informed decisions. By improving yield
forecasting accuracy, the model facilitates effective resource allocation and sustainable
agricultural planning.
Accurate crop yield prediction relies on the effective classification and clustering of agricultural
fields using remote sensing data. While machine learning and deep learning models have
significantly improved yield forecasting, challenges arise when dealing with small-scale
farmlands where spatial granularity is reduced. In remote sensing, clustering crop fields is
relatively straightforward for large agricultural plots due to their distinct spectral and spatial
characteristics. However, in fragmented or smallholder farming regions, traditional clustering
techniques struggle to distinguish individual fields, leading to classification errors and reduced
model accuracy.
Our work addresses this challenge by developing a clustering approach tailored to small-scale
agricultural landscapes. Rather than relying on K-means or DBSCAN, which tend to break down
in high-granularity settings, we adopt more advanced methods that integrate multi-scale feature
extraction from remote sensing data. Using Sentinel-2 imagery, we aim to apply the benefits of
multi-modal deep learning frameworks to further improve the separation of closely located crop
fields. We also examine hybrid clustering methods that combine spatial and spectral information
to enhance the precision of segmenting smallholder farm regions.
One of the prime features of our approach is applying clustering techniques that adapt
dynamically to varying field sizes by adjusting the granularity of the extracted features. Large
crop fields exhibit uniform spectral responses, but smaller fields can produce mixed signals due
to adjacent crop types, variations in soil characteristics, and, in some cases, irrigation patterns.
For such cases, hierarchical clustering methods will be implemented in which feature extraction
improves at various scales of spatial observation, so that even the smallest plots are correctly
classified.
We also discuss the integration of CNN-based segmentation models with clustering algorithms,
which will allow for a more precise crop classification. The traditional unsupervised clustering
methods fail to account for spectral inconsistencies in small fields, but deep learning models can
learn contextual dependencies that enhance the accuracy of classification. Our proposed
methodology improves crop yield estimation in regions with smallholder farms and contributes
to better resource allocation, policy-making, and food security strategies.
Our research bridges the gap between the capabilities of remote sensing and the real-world
needs of agriculture by addressing the challenge of clustering small-scale agricultural fields.
This work has considerable potential to advance precision agriculture by increasing the accuracy
of crop health evaluation and by contributing to sustainable agriculture through better satellite-
based monitoring. By increasing yield forecasting accuracy, the model aims to facilitate more
effective resource management and sustainable farming planning; long-term planning can also
build on such crop yield analysis. Our approach uses satellite data from Sentinel-2. Sentinel-2 is
part of the Copernicus Programme of the European Space Agency, whose first satellite was
launched in 2015, and it acquires optical imagery at high spatial resolution (10 m to 60 m) over land.
Sentinel-2 comprises two identical satellites, Sentinel-2A and Sentinel-2B. Spatial resolution is
a measure used in the context of remote sensing and refers to the size of the smallest feature that
can be detected by a satellite sensor or displayed in a satellite image. It is expressed as a single
value that represents the length of one side of a square; for example, a spatial resolution of 50 m
means that one pixel represents an area of 50 by 50 meters on the ground. Many satellite
programs launched by various agencies offer different levels of spatial resolution. Choosing
Sentinel-2 for crop yield prediction offers an edge due to its unique characteristics. The reasons
for choosing the program are listed below.
1. High Spatial Resolution: Sentinel-2 provides high spatial resolution imagery of about 10-20
meters along with a recurrent revisit time of 5 days, giving the ability to monitor crop health at
field level with high precision.
2. NDVI and Vegetation Health Monitoring: Sentinel-2 is equipped with the spectral bands needed
to calculate NDVI and other vegetation indices, which are critical for monitoring crop health and
biomass, a direct indicator of yield potential.
3. Multi-Spectral Capabilities: Sentinel-2 has a total of 13 spectral bands, including specific
bands for red edge and short-wave infrared, which are highly sensitive to chlorophyll content and
water stress in vegetation.
4. Periodic Data Availability: Sentinel-2 provides continuous, high-resolution data from 2015
onward, allowing up-to-date analyses and insights into current agricultural practices.
FIGURE 1
Illustration of low, medium, and high spatial resolutions.
Image Generated from AI*
AI-Based Image Generation from Text*
AI-based image generation translates text prompts into images using advanced deep learning models. The
process involves Natural Language Processing (NLP), diffusion models, and post-processing techniques
to create high-quality visuals.
1. Understanding Text Input (NLP)
The process starts with Natural Language Processing (NLP), where the AI interprets the user's
text prompt. Models like CLIP (Contrastive Language-Image Pretraining) map text and images
into a shared space, ensuring accurate visual representations of the described content.
2. Diffusion Models (Core of Image Generation)
Modern AI-generated images primarily rely on diffusion models (e.g., DALL·E 2, Stable
Diffusion, Midjourney), following a three-step process:
• Random noise initialization – The model begins with a completely noisy image.
• Step-by-step denoising – The AI refines the image gradually to match the given text.
• Final image formation – The noise is removed progressively, resulting in a coherent and
high-resolution image.
Diffusion models surpass other methods in generating realistic, high-quality, and detailed
images, making them the preferred approach.
3. Refinement and Post-Processing
After generating an initial image, AI models apply enhancements such as:
• Super-resolution – Increases image clarity and sharpness.
• Fine-tuning – Incorporates user feedback for better accuracy.
• Style adaptation – Adjusts artistic elements based on context or preference.
4. Conclusion
AI-driven text-to-image generation integrates NLP, diffusion models, and post-processing to
create high-quality visuals. Diffusion models have emerged as the leading technique, offering
superior realism and detail. As AI research progresses, these models will continue evolving,
broadening their application across various industries.
The data retrieved from the satellites is raw and multispectral in nature. Multispectral imaging
captures image data within specific wavelength bands across the electromagnetic spectrum. This
is where NDVI comes into play. NDVI stands for Normalized Difference Vegetation Index and
is a metric that can measure different aspects of plant growth. NDVI relies on the interaction of
plants with different wavelengths of sunlight, especially in the visible red and near-infrared
(NIR) regions of the electromagnetic spectrum. Healthy plants tend to absorb most of the visible
red light for photosynthesis while reflecting a large amount of NIR light. The converse is seen in
unhealthy plants or areas of sparse vegetation, where red light is reflected more than NIR. In
Sentinel-2, the B4 band is the red band and the B8 band is the NIR band. The formula for NDVI
is
NDVI = (NIR-RED) / (NIR+RED)
In this equation:
NIR: the reflectance value in the near-infrared spectrum (typically around 750-900 nm)
Red: the reflectance value in the visible red spectrum (around 620-750 nm)
The equation yields a value between -1 and +1:
Values near +1 indicate dense and healthy vegetation.
Values near 0 indicate barren or non-vegetated surfaces, such as soil, barren land, urban
settlements, and rocks.
Values below 0 indicate water bodies or areas with no vegetation.
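The NDVI formula can be sketched in NumPy as follows (a minimal illustration; the reflectance values below are made up for demonstration, not taken from the paper's dataset):

```python
import numpy as np

def compute_ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Compute NDVI = (NIR - RED) / (NIR + RED), guarding against division by zero."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    # Where NIR + RED == 0 (no reflectance at all), define NDVI as 0.
    return np.where(denom == 0, 0.0, (nir - red) / np.where(denom == 0, 1.0, denom))

# Illustrative reflectance values: healthy vegetation reflects far more NIR than red.
red = np.array([0.05, 0.30, 0.20])   # visible red reflectance (B4)
nir = np.array([0.60, 0.32, 0.10])   # near-infrared reflectance (B8)
print(compute_ndvi(red, nir))        # dense vegetation -> ~0.85, bare soil -> ~0.03, water -> negative
```

The first pixel (high NIR, low red) lands near +1, the second near 0, and the third (red above NIR, as over water) is negative, matching the interpretation above.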
Satellite Image NDVI
FIGURE 2
Satellite Image of Patiala, Punjab, India and its respective NDVI
FIGURE 3
Types of Non NDVI Signatures on NDVI map (RGB).
The next step is to analyze the satellite imagery using geographic information system (GIS)
software, namely QGIS. QGIS (Quantum Geographic Information System) is an open-source
GIS application that allows users to analyze satellite imagery for their respective purposes, and
it provides the capability to compute the aforementioned NDVI.
The authors of [1] were successful in crop yield prediction because the fields in their dataset are
much vaster and more structured than those in India, and their area of interest is much larger.
Moreover, clustering is easier on labeled data. Indian crop fields present significant challenges
for a similar approach. First, crop division in Indian fields is much more scattered. Second, field
size in India is quite small. These factors lead to relatively lower-resolution images (not lower
actual resolution, but lower in the sense that the images contain a much larger area of interest
than the Chinese fields) and lower relative granularity in spatial resolution. In the context of
satellite images, image granularity refers to the level of detail, or the fineness of features, that the
image can capture. It is influenced by the spatial resolution, the spectral resolution, and the
specific application for which the image is being analyzed. In short, image granularity refers to
the level of meticulousness in an image.
FIGURE 4
Satellite images from Jixi, China at 200m above ground, showing larger and more structured fields than
Figure 5
FIGURE 5
Satellite image of Guna district, Madhya Pradesh, India. Smaller and loosely structured fields can be seen
at 200m above ground.
The above are the main challenges for crop yield prediction using satellite data in India. An
SVM model was applied to the image, but it is not optimal for capturing the spatial features of
NDVI and performed only slightly better than random clustering, with an ARI score of 0.54. The
other approach considered was DBSCAN. DBSCAN is a good algorithm for detecting outliers
but carries a very high computational load by design, with a time complexity of O(n²); when
applied to district-level data, it consistently runs into memory errors. Finally, the deep-learning-
based Autoencoder-PCA-K-Means (APK) framework was selected. PCA stands for Principal
Component Analysis and is a machine learning technique for dimensionality reduction and
feature extraction: high-dimensional data is transformed into a lower-dimensional form while
retaining as much variability as possible.
Method: The dataset for this paper was obtained from Google Earth Engine, as Google Earth
Engine can provide data for a particular coordinate, band, and time. Google Earth Engine also
provides the ability to select cloud cover. Cloud cover refers to the percentage of an image
area that is blocked by clouds. Cloud cover information is important for remote sensing analysis
because clouds obscure the surface features of interest, making it difficult to interpret the data
accurately. Google Earth Engine takes raw images from the twin satellites of the Sentinel-2
program and stitches them according to the specifications given by the user. It provides the
ability to select a particular district and area in the form of a polygon, and it further filters out
the specified cloud cover within the specified date range. The main features of the Sentinel-2
dataset are the aforementioned NDVI and various spectral bands, of which the relevant ones are:
B4 (Red, 665 nm): Indicates plant health; essential for vegetation indices like NDVI.
B8 (NIR, 842 nm): Highly reflected by healthy vegetation; essential for vegetation indices.
The dataset is taken from the Guna district of the Indian state of Madhya Pradesh [Citation here].
The major crops grown in the Rabi season in the district are wheat (91,800 hectares) and gram
(83,700 hectares). Wheat is sown around 25 November and gram around 15 October. The peak
NDVI of gram occurs around 57 days from sowing, and that of wheat around 75 days from
sowing [2].
FIGURE 6
Location of Guna District in India
FIGURE 7
Fields in Guna District
FIGURE 8
Wheat and gram fields in Guna district.
FIGURE 9
Showing Difference in peak values of NDVI of Wheat and Gram
(Mandal, U. K., Sarma, K. S., Victor, U. S., & Rao, N. H. (2002). Profile water balance
model under irrigated and rainfed systems. Agronomy Journal, 94(5), 1204-1211.)
FIGURE 10
NDVI clustered image of Fig.5
To overcome the challenge of relative granularity resolution, a deep learning algorithm is needed
to extract the underlying features from the image.
To begin, the image is downloaded in TIFF format, which is in grayscale. The image is
preprocessed in QGIS to generate the NDVI: QGIS converts the grayscale image to the Red,
Yellow and Green (RYG) color format, and thus the NDVI image is generated.
The model used is the Autoencoder-K-Means framework, which extracts detailed information
from low-resolution images. The NDVI image is fed into the autoencoder, which extracts the
features (i.e., the targeted crop), and K-Means clustering is then applied to group the refined
crop-type image. The code leverages autoencoders, PCA, and K-Means to create an iterative
clustering pipeline.
The Rasterio Python library was used to load the NDVI TIFF files, and preprocessing was
applied to normalize the data. The data was then transposed to bring the image into the correct
shape.
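This preprocessing step can be sketched as follows (a minimal NumPy illustration; min-max normalization is an assumption, since the text only says the data was normalized, and the input array is assumed to come from Rasterio's `read()`, which returns a (bands, rows, cols) array):

```python
import numpy as np

def preprocess_raster(band_data: np.ndarray) -> np.ndarray:
    """Normalize raster data to [0, 1] and transpose from Rasterio's
    (bands, rows, cols) layout to (rows, cols, bands).

    `band_data` is assumed to come from Rasterio, e.g.:
        with rasterio.open("ndvi.tif") as src:
            band_data = src.read()
    """
    data = band_data.astype(np.float64)
    lo, hi = np.nanmin(data), np.nanmax(data)
    if hi > lo:
        # Min-max normalization (an assumption; the paper only says "normalize").
        data = (data - lo) / (hi - lo)
    return np.transpose(data, (1, 2, 0))  # -> (rows, cols, bands)

# Example with a fake single-band, 2x3 NDVI raster:
fake = np.array([[[-0.2, 0.1, 0.4], [0.6, 0.8, 1.0]]])  # shape (1, 2, 3)
print(preprocess_raster(fake).shape)  # (2, 3, 1)
```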
An autoencoder network was created using TensorFlow's Keras API. After the input layer, an
encoding layer with ReLU activation was instantiated, and a decoding layer was defined with a
sigmoid activation function. The autoencoder compresses the input data into a lower-
dimensional feature space by learning an efficient encoding; this step extracts the key features
that represent the data's essential characteristics while discarding noise, which is essential for
extracting important features from a low relative-granularity image. The compressed
representation was then decoded back into the original dimensions with the extracted features.
The model was iterated 10 times for better learning of the most important features by: learning
patterns in the data, compressing the data more efficiently each time, reconstructing (decoding)
the image more accurately, and updating the weights during training to minimize the difference
between the original input and the reconstructed output using the MSE loss function. The
optimizer used was Adam.
Encoder (ReLU activation):
The encoder transforms the input x ∈ R^n into a latent representation z ∈ R^m, using the ReLU
activation function:

z = ReLU(W_e x + b_e)

Where:
W_e is the weight matrix for the encoder.
b_e is the bias vector for the encoder.
ReLU(u) = max(0, u) is the activation function that returns the maximum of 0 and the input
value.

Decoder (Sigmoid activation):
The decoder reconstructs the original input x from the latent representation z, using the sigmoid
activation function:

x̂ = σ(W_d z + b_d)

Where:
W_d is the weight matrix for the decoder.
b_d is the bias vector for the decoder.
σ(u) = 1 / (1 + e^(−u)) is the sigmoid function, which outputs values in the range [0, 1].

Loss function:
We aim to minimize the reconstruction error between the original input x and the reconstructed
output x̂. Since the output of the decoder is between 0 and 1 (due to the sigmoid activation), the
loss function is typically binary cross-entropy:

L(x, x̂) = −(1/n) Σ_{i=1}^{n} [ x_i log(x̂_i) + (1 − x_i) log(1 − x̂_i) ]
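The encoder, decoder, and reconstruction loss can be sketched in plain NumPy as follows (a minimal illustration rather than the trained Keras model; the dimensions, random weights, and single input vector are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(u):
    return np.maximum(0.0, u)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Hypothetical dimensions: n = 8 input features per pixel, m = 3 latent features.
n, m = 8, 3
W_e, b_e = rng.normal(size=(m, n)), np.zeros(m)   # encoder parameters
W_d, b_d = rng.normal(size=(n, m)), np.zeros(n)   # decoder parameters

x = rng.uniform(size=n)          # one normalized input vector in [0, 1]
z = relu(W_e @ x + b_e)          # encoder: latent representation z = ReLU(W_e x + b_e)
x_hat = sigmoid(W_d @ z + b_d)   # decoder: reconstruction x_hat = sigmoid(W_d z + b_d)

# Binary cross-entropy reconstruction loss (eps avoids log(0)).
eps = 1e-12
bce = -np.mean(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))
print(z.shape, x_hat.shape, bce)
```

In training, the weights W_e, b_e, W_d, b_d would be updated by an optimizer such as Adam to minimize this loss; the sketch only shows a single forward pass.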
Further, PCA (Principal Component Analysis) was used to reduce dimensionality while
preserving the extracted features. Dimensionality reduction was essential in order to proceed
with the high-resolution satellite imagery, which had low relative granularity: it reduces the
dimensions of the image while preserving the key features extracted by the autoencoder.
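The PCA step can be sketched with scikit-learn (a hypothetical illustration; the latent feature matrix and the 95% explained-variance target are assumptions, since the paper does not state the reduced dimensionality):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical autoencoder output: 500 pixels x 32 latent features.
latent_features = rng.normal(size=(500, 32))

# Keep enough principal components to explain 95% of the variance
# (the exact target is an assumption; the paper only says dimensionality was reduced).
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(latent_features)
print(reduced.shape, pca.explained_variance_ratio_.sum())
```

The reduced matrix then feeds directly into the K-Means stage of the pipeline.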
Finally, K-Means was applied to cluster the extracted features with K = 3, the clusters
representing non-NDVI surfaces, gram, and wheat, in increasing order of their NDVI values.
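The K-Means step can be sketched as follows (a scikit-learn sketch on synthetic per-pixel NDVI values; the NDVI ranges assigned to each class here are illustrative assumptions, not measurements from the Guna dataset):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic per-pixel NDVI values for three groups (illustrative ranges only):
ndvi = np.concatenate([
    rng.normal(0.05, 0.02, 100),   # non-NDVI surfaces (soil, settlements)
    rng.normal(0.45, 0.03, 100),   # gram
    rng.normal(0.75, 0.03, 100),   # wheat
]).reshape(-1, 1)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ndvi)

# Relabel clusters in increasing order of mean NDVI so that
# 0 = non-NDVI, 1 = gram, 2 = wheat, matching the paper's ordering.
order = np.argsort(kmeans.cluster_centers_.ravel())
relabel = np.empty(3, dtype=int)
relabel[order] = np.arange(3)
labels = relabel[kmeans.labels_]
print(np.bincount(labels))  # [100 100 100]
```

Sorting the clusters by their center values is what makes the label order meaningful (non-NDVI < gram < wheat), since K-Means itself assigns cluster ids arbitrarily.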
Results and Accuracy: The ARI, or Adjusted Rand Index, was used to evaluate the model's performance.
ARI = (Index − Expected Index) / (Max Index − Expected Index)
The model was trained on a manually labeled image and iterated for another nine passes on the
same image, as the model reuses the learned features, further refining its latent-space
representation and clustering result.
The trained APK model was then applied to another NDVI image to perform clustering and was
evaluated through its ARI score against the actual labels.
The mean ARI score of the model when trained and iterated 10 times was 82.3%, and the ARI
score on the second image was 87.5%, which indicates that the model is capturing the underlying
features and generalizing well, neither overfitting nor underfitting.
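The ARI evaluation can be reproduced with scikit-learn's adjusted_rand_score (the label arrays below are illustrative, not the paper's data):

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical ground-truth labels (0 = non-NDVI, 1 = gram, 2 = wheat)
# and predicted cluster labels for ten pixels.
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
pred_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
print(adjusted_rand_score(true_labels, pred_labels))  # 1.0 for a perfect match

# ARI is invariant to label permutation: swapping cluster ids does not change it,
# which is why it suits unsupervised clustering evaluation.
permuted = [2, 2, 2, 0, 0, 0, 1, 1, 1, 1]
print(adjusted_rand_score(true_labels, permuted))     # also 1.0
```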
References:
1. Lu, T., Gao, M., & Wang, L. (2023). Crop classification in high-resolution remote
sensing images based on multi-scale feature fusion semantic segmentation model.
Frontiers in Plant Science, 14, 1196634.
2. Mandal, U. K., Sarma, K. S., Victor, U. S., & Rao, N. H. (2002). Profile water balance
model under irrigated and rainfed systems. Agronomy Journal, 94(5), 1204-1211.
3. Ferral, A., Luccini, E., Aleksinkó, A., & Scavuzzo, C. M. (2003). Remote Sensing
Applications: Society and Environment.
4. Elavarasan, D., & Vincent, P. D. (2020). Crop yield prediction using deep reinforcement
learning model for sustainable agrarian applications. IEEE Access, 8, 86886-86901.
5. Condran, S., Bewong, M., Islam, M. Z., Maphosa, L., & Zheng, L. (2022). Machine
learning in precision agriculture: a survey on trends, applications and evaluations over
two decades. IEEE Access, 10, 73786-73803.
6. Sayago, S., & Bocco, M. (2018). Crop yield estimation using satellite images:
Comparison of linear and non-linear models. AgriScientia, 35(1), 1-9.
7. Sabini, M., Rusak, G., & Ross, B. (2017). Understanding satellite-imagery-based crop
yield predictions. Stanford.
8. Olisah, C., Smith, L., Smith, M., Morolake, L., & Ojukwu, O. (2024). Corn yield
prediction model with deep neural networks for smallholder farmer decision support
system. arXiv preprint arXiv:2401.03768.